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ABSTRACT 

We present a multimoment technique for signal classification and apply it to the detection of fast 
radio transients in incoherently dedispersed data. Specifically, we define a spectral modulation index in 
terms of the fractional variation in intensity across a spectrum. A signal whose intensity is distributed 
evenly across the entire band has a lower modulation index than a spectrum whose intensity is localized 
in a single channel. We are interested in broadband pulses and use the modulation index to excise 
narrowband radio frequency interference (RFI) by applying a modulation index threshold above which 
candidate events are removed. The technique is tested both with simulations and using data from 
known sources of radio pulses (RRAT J1928+15 and giant pulses from the Crab pulsar). The method 
is generalized to coherent dedispersion, image cubes, and astrophysical narrowband signals that are 
steady in time. We suggest that the modulation index, along with other statistics using higher-order 
moments, should be incorporated into signal detection pipelines to characterize and classify signals. 



1. INTRODUCTION 

Surveys have always played an important role in as- 
tronomy and will play an increasingly important role in 
the future as observatories such as the Large Synoptic 
Sky Telescope (LSST) and the Square Kilometer Array 
(SKA) come online. The huge volumes of data gener- 
ated by surveys require robust pipelines to identify and 
characterize populations with maximal completeness and 
minimal false positives. Regardless of the target source 
class, all detection pipelines rely on signal-to-noise ra- 
tio (SNR) to find sources and quantify their believabil- 
ity. This approach underutilizes the spectral informa- 
tion contained in the data, because it only uses the total 
intensity, or first moment, of a spectrum. We propose 
adding a second statistic, the spectral modulation index, 
that uses both the first moment (signal mean) and sec- 
ond moment (signal variance) of a spectrum to classify 
signals found through their SNR. 

Although the technique we present in this paper is ap- 
plicable to any data collected as intensity versus time and 
frequency, we focus on surveys for fast radio bursts and 
use the modulation index to identify and remove nar- 
rowband radio frequency interference (RFI). Surveys at 
radio frequencies must contend with RFI because, if not 
mitigated or excised, it can produce many false positives 
in processing pipelines. One difficulty in removing RFI 
is that it arises from a wide variety of terrestrial sources 
with different signal characteristics. Moreover it can be 
episodic or simply transient in nature along with the as- 
trophysical signals that we are interested in. RFI can 
be broad in time and narrow in frequency (e.g. Global 
Positioning System satellites) or broad in frequency and 
narrow in time (e.g. lightning). Some radar signals sweep 
in frequency and mimic the plasma dispersion of astro- 
physical bursts. 
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We define fast radio bursts as having character- 
istic widths less than about one second, so as- 
trophysical plasma delays, such as those encoun- 
tered in pulsar signals and in the class of tran- 
sients known as rotat ing radio transients (RRATS, 
iMcLaughlin et aLll2006D . are important. When applied 
to fast transients, our technique builds upon the meth- 
ods presented in 'Cordes and McLa ughlinl (|2003f ) and 
IMcLaughlin and Cordes (2003) and that led to the dis- 
covery of RRATs. However, the basic idea applies to 
transients of any duration. 

Actual signals will blur the distinction between RFI 
and signals of interest because some RFI will be broad- 
band and some broadband astrophysical signals will show 
significant frequency variation. Astrophysical sources 
may have a spectral dependence including a simple spec- 
tral index or stronger modulations like those seen in so- 
lar bursts. Compact sources of fast transients will typi- 
cally show frequency modulation from interstellar scintil- 
lation. We consider these effects in our implementation 
of the method. 

The modulation index is not a new statistic; it has 
been used to measure time variations in a variety of 
astro nomical applications, such as variability of pul- 
sars (Weisberg et al." '1986") and extragalactic sources 
ziora-Chudczer ct al. 2001) caused by interstellar 
scintillations. It h as also been used to study properties 
of the solar wind (Spangler and SpitleHl2004l) . 

We lay out the mathematical groundwork for multimo- 
ment dedispersion and the calculation of the modulation 
index in §[2l In §[3] we discuss the implementation of the 
modulation index in a fast transient detection pipeline 
and present the results of a simulated single pulse de- 
tection pipeline. Also in § [3] we apply the technique to 
two known sources of single pulses: Crab giant pulses 
and RRAT J1928-hl5. The method is extended to other 
types of data sets in § [H We discuss how the spectral 
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modulation index can be used to classify bursts from a 
variety of astrophysical sources in § [5] and make conclud- 
ing remarks in § |6l 

2. METHOD 

For specificity we consider broadband astrophysical 
signals that are sampled as dynamic spectra; that is, 
a sequence of spectra separated in time by Atg with 
Ni, frequency channels spanning a total bandwidth B. 
Most of the cases we discuss in this paper will have 
time-bandwidth products well in excess of unity, i.e., 
/SisB/Ny ^ 1, as is consistent with fast-dump spectrom- 
eters used in surveys for pulsars and radio transients. 
Our discussion will also concentrate on incoherent (post- 
detection) dedispersion, although we briefly discuss ap- 
plications where coherent dedispersion is used. To illus- 
trate the basic method, we consider only a simple sum 
over frequency to yield an intensity time series; later we 
will consider interstellar dispersion delays in the sum and 
usage of the dispersion effect in discriminating astrophys- 
ical signals from RFI. 

We define the modulation index m/ as the normalized 
standard deviation of the intensity / across frequency 
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where the first and second moments, / and respec- 
tively, are given by 



(2) 



The modulation index, m/, characterizes the distribu- 
tion of signal power in spectrum. If the power is dis- 
tributed evenly across the band, the variance is small, 
and the spectrum has a low modulation index. A broad- 
band pulse with a flat spectrum and infinite signal-to- 
noise ratio (SNR) has m/ ~ 0. In the opposite extreme 
where the power is localized to single spectral channel, 
the modulation index is mi V — 1 with increas- 
ing SNR. These simple examples illustrate how broad- 
band astrophysical transients can be discriminated from 
RFI, which often consists of narrow spikes in frequency 
accompanied by time variability, by requiring that the 
modulation index be less than some ceiling, mi^max- 

2.1. Multimoment Dedispersion 

The modulation index improves upon current detec- 
tion schemes by including more information about the 
signal through the calculation of both the first and sec- 
ond moments. The signal processing algorithms used 
in the detection of short-duration radio transients (i.e., 
dedispersion and smoothing) must therefore be expanded 
to higher order moments. Although we only use the first 
and second moments in this paper, one could consider 
statistics that use higher order moments (e.g., skewness 
and kurtosis), so we introduce general, n*''-order expres- 
sions. 

Radio pulses traveling through the interstellar medium 
are subject to frequency-dependent dispersion. Standard 
pulsar processing techniques remove the effects of disper- 
sion by shifting the intensity in each frequency channel 



in time according to the h'~^ dispersion relation and av- 
eraging to increase the signal-to-noise ratio (SNR). 

The standard incoherent or post-detection approach 
for calculating a dedispersed time series is 



I{t, DM) = ^ ^ I(t + tuuH, 



(3) 



where /(t, v) is the time-frequency intensity data, Ni, is 
the number of frequency channels, and t£)M{v) is the de- 
lay at frequency v for dispersion measure DM. Through- 
out we will denote an average in frequency with a bar over 
the variable. Survey data must be dedispersed using a 
set of trial DMs because the DM of the target sources are 
not known a priori, except in special applications, such 
as searches of globular clusters with previously known 
pulsars. A list of candidate pulses is defined by apply- 
ing a minimum SNR threshold (SNRmin) to the set of 
dedispersed time series. 

Equation [3] implicitly weights all frequency channels 
equally. In general though there will be effects, both 
astrophysical and instrumental, for which optimal de- 
tection (i.e., maximum SNR) requires unequal weight- 
ing. For example broadband astrophysical sources have 
spectra with slopes characterized by a spectral index a, 
and maximum SNR occurs when the frequency chan- 
nels are given some weighting Wu that reflects a. Sim- 
ilarly removing instrumental effects, like bandpass sub- 
traction, may introduce channel-dependent root-mean- 
square noise. The generalized expression for a weighted, 
dedispersed, first moment time series is 

^w,, [v/v(3)°'l{i + iY:)M{v),v) 

7(t,DM;a) = ^^ — . (4) 

^w, 

V 

A full discussion of frequency weights is presented in Sec- 
tion O 

Optimization in detection can also be made by consid- 
ering the duration of the pulse relative to the time resolu- 
tion of the data. The effective time resolution can be de- 
creased by smoothing the data, and the maximum SNR 
occurs when the effective time r esolution of the smoothed 
data matches that of the pulse (jCordes and McLaughlinI 
l2003f ). The simplest method of smoothing adds adjacent 
time samples until all of the signal's power is in a sin- 
gle sample. In general any smoothing technique can be 
defined by applying smoothing weights Wt to the time- 
frequency data and averaging in time; 



^ Wtt'I(t', J/) 

t' 



(5) 



where Is{t,h') is the smoothed time-frequency data and 
/(t', ly) is the original time-frequency data. A dedis- 
persed, smoothed time series Is{t, DM) is calculated ac- 
cording to Equation 13] substituting Is for /. 

The calculation of the modulation index requires the 
second moment of the intensity time series. We make one 
final generalization by expanding Equations |4] and [5] to 
higher order moments. The weighted, dedispersed n*''- 
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moment time series is 

^ Wi. [{v/vo)"' I(t + t-DMiv),!^)]^ 

7^(i,DM;a) = ^ — . (6) 

The smoothed, dedispersed n*''-momcnt time series is 

^Wtt'I(t' + tDM(l^),i^) 



t' 



•(7) 



Because the spectral modulation index is a measure of 
the frequency structure, and not the temporal structure, 
the data are smoothed first in time, thereby consolidating 
the signal into a single spectrum, before it is squared and 
averaged. The modulation index for smoothed data is 
then 



(8) 



We have specified separate expressions for the dedis- 
persed intensity with frequency weights and smoothing 
weights solely for clarity; an optimal detection scheme 
would employ both. In the remainder of the paper we 
will explore the role of smoothing on the modulation in- 
dex but keep the assumptions that — 1 and a — 0. 

An alternative formulation to calculating the 
smoothed, dedispersed time series is to square the 
data before smoothing: 



In^t,DM) = ^Y.- 



J2 '^'t*' 
t' 



(9) 



This approach captures both the time and frequency 
variation of the data encompassed by the sumations. 
Throughout this paper we will focus on the spectral mod- 
ulation index but will discuss this "time resolved" modu- 
lation index in Section [2.6l when we discuss the signature 
of incorrect dedispersion. 

2.2. Frequency weights 

As discussed in the previous section, weighting the fre- 
quency channels non-uniformly when calculating a dedis- 
persed time series can improve the SNR of the detection. 
In general an astrophysical signal will have a power law 
spectrum 



(10) 



where P^t is the pulse flux density at time t and channel 
ly, Vo is a reference frequency, and Oo is the source's spec- 
tral index. Frequency weights that yield the best SNR 
act to flatten the spectrum 



(11) 



where optimal detection occurs for a = ao- Uniform 
spectral index frequency weights implicitly assumes a flat 



spectral index of a = 0. Pulsars have typical spectral in- 
dices of ao — 1.6 with significant var iation {ao^min — 
and ao^max = 3) ()Lorimer et aLlfTQOSD . Processing with 
an implied spectral index of a = is only ideal for the 
minority of pulsars with the lowest observed spectral in- 
dices. Instead, applying frequency weights correspond- 
ing to the mean pulsar spectral index would increase the 
SNR with little additional computational cost. 

Surveys for new classes of sources where a is unknown 
or surveys searching for extremely weak examples of a 
known population may warrant a search over spectral 
index. Dedispersing data using a set of trial spectral in- 
dices would increase the required computation by a fac- 
tor equal to the number of trial spectral indices. Such a 
search may only be practical for offline post-processing. 

Frequency weights can also reflect frequency- 
dependent noise variations caused by instrumental 
effects. In general we can define a signal model 

lut — buiT^t + gvtPft), 

where hi, is the bandpass shape, Tyt is the system temper- 
ature, g^t is the gain (e.g., K Jy~^) and Pi,t is the pulsar 
or transient flux density. The quantity we are interested 
in is P^n but the quantity we measure is I^t- Isolat- 
ing Pi, requires removing the three frequency-dependent 
instrumental effects, which may introduce frequency- 
dependent rms noise. For example flattening the band- 
pass (6i/) will result in higher noise at the band edges. 
For systems operating at low frequencies 100 MHz) 
and large total bandwidths, the system temperature may 
vary across the band due to the strong frequency depen- 
dence of the sky brightness temperature. Finally gain 
variations will be both time and frequency dependent 
due to the particularities of the instrument. Again one 
chooses Wi, such that SNR is maximized, so if the ad- 
ditive noise can be modeled as Gaussian white noise, 
weighting the channels by the inverse of their variance 
(i.e., Wv oc 1/cr^) results in the maximum SNR. 

To estimate the importance of considering frequency- 
dependent noise variations, we adopted a simple model 
for system temperature T^t = 20K -f 10K(i//i/o)~^'^ and 
a signal with a non-zero spectral spectral index. The re- 
sulting SNR after correcting for the spectral index and 
averaging over frequency was compared to the standard 
case that assumes a flat spectral index. At higher ob- 
serving frequencies (^ 1 GHz) the improvement is small 
(^ 0.1%) for small or large bandwidths because of the low 
sky brightness temperature. At lower observing frequen- 
cies (~ 100 MHz), there is an improvement in SNR by as 
much as ^ 80% for wide bandwidths. This is particularly 
relevant for observatories like the Low Frequency Array 
(LOFAR, de Vo s et al. 200.d) and Murchison Widefield 
Array (MWA. iLoiIsdale et all 120091 ). 

A non-zero spectral index will increase the variance of 
a spectrum and therefore also increase the modulation 
index. This is undesirable because it would mistakenly 
imply a lower filling factor and may result in a true sig- 
nal being flagged as RFI. The magnitude of the increase 
depends on the observation frequency {vo)^ bandwidth 
{B), and spectral index. For B/vo ^ 0.1 the modulation 
index increase is of order ~ 0.1 for a ~ 3. In the extreme 
case of B/vo ^ 1 , the modulation increase is of order ^ 1 
for a ~ 3. Current instruments have B /vq ~ 0.1, so the 
increase in modulation index due to a source's spectral 
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index is negligible. The trend is for new instruments to 
have larger bandwidths, so eventually the modulation in- 
crease will be significant enough that correcting for the 
spectral index becomes necessary. Furthermore in sur- 
veys that employ a spectral index search, the modula- 
tion index aids the analysis, because the trial spectral 
index closest to the true spectral index has the lowest 
modulation index. 

2.3. Modulation Index 

The modulation index is a quantitative measure of the 
patchiness, or modulation, of a spectrum. It differen- 
tiates signals whose power is distributed evenly across 
the band from those whose power is isolated to a few 
frequency channels. Broadband and narrowband sig- 
nals have small and large modulation indices respec- 
tively. The level of modulation is parametrized by a 
frequency filling factor f^, = Wy/N^ where is the 
number of channels in the spectrum that contains signal. 
Broadband astrophysical signals have a high filling fac- 
tor {fi, — 1). RFI can be both narrowband or broadband 
with a filling factor ranging from fi, = l/N^ to 1. As this 
paper focuses on detecting broadband signals, the ulti- 
mate goal is to define a modulation index cutoff, mi^niax, 
that will enable us to fiag signals with low filling factors 
and vastly reduce the number of candidates created by a 
signal detection pipeline. 

The analysis below assumes that the data have been 
searched for candidate signals by requiring a candidate 
to have an SNRt larger than a minimum SNR (SNRmin), 
and the modulation index is only calculated for these 
candidate signals. Because we assume that our data has 
zero mean, either through construction in the case of sim- 
ulated data or through bandpass subtraction in the case 
of real data, this assumption assures that / > and mi 
is well-defined. Furthermore, our interpretation of mj as- 
sumes SNRmin > 1, which is reasonable assumption since 
such a low SNRinin would result in a deluge of events. 

To derive a simple, analytical expression for the depen- 
dence of mi on we consider a spectrum with N^, chan- 
nels that contains two components: noise and signal. The 
noise is assumed to have a mean of zero and variance cr^ . 
The signal fills Wi, channels with intensity Ai in channel 
i. While each Ai may be different, it will be useful to 
define A, the average signal intensity over Wi, channels. 
The average signal intensity over the entire band is fi,A. 
As we are usually more interested in signal-to-noise ratios 
(SNR), we define a single-channel SNR, SNR,,t = A/cr^, 
and time series SNR, SNRt = ^/N;Jfl.SNR^t• The latter 
expression assumes that the standard deviation of the 
noise in the times series is a^^j \J N^. 

The modulation index for this simple signal model is 
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where we have introduced a separate signal modulation 
index, ttia = ap^l A. This modulation index characterizes 
the inherent frequency structure of a signal, and for this 
initial discussion we consider only signals with negligible 
structure (ttia ^1). In Sections 12.41 and 12.51 we discuss 
the effects of interstellar scintillation and pulsar self-noise 
and will introduce a non-zero mA- 
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Figure 1. Time series SNR vs. modulation index for a simulated 
datasct with N-u = 256 containing noise (open circles), dispersed 
pulses (filled circles), and two sets of one-channel- wide RFI spikes 
(open squares). The solid horizontal line represents the applied 
intensity threshold of SNRj^in = 3. The vertical dashed line rep- 
resents mj x = 5.3 as calculated for SNRt = 3 using Equation 1141 
The dotted curve shows mj as a function of SNRt (Equation ll3ll . 

To explore the behavior of the modulation index as a 
function of SNRt and /i^, we will look at the extrema 
of /i^: fv — ^ and f„ — l/Ni^. It is important to note 
that Equation [T^ as well as the limiting expressions de- 
fined below, represents the ensemble average values of 
the modulation index. Statistically they are the average 
value expected for a given combinations of SNRt and fi,, 
but noise in the data will cause variation in the actual 
values. The expressions below are also idealized in so far 
as we have set toa = 0. 

We also simulated a time-frequency data set contain- 
ing broadband and ultra-narrowband signals added to 
Gaussian noise to accompany the discussion. It was 
processed according to standard pulsar processing tech- 
niques, and the results are shown in Figure [T] The data 
set has N^, = 256 and includes 100 broadband pulses 
(/^ = 1) with Wt = 1 and SNRt 10, two sets of 100 
ultra-narrowband spikes (/^ = l/^i^) with Wt — 1 and 
SNRt = 5 and SNRt = 10 respectively, and 10*^ noise- 
only spectra with zero mean and — 1. We ignored 
dispersion for simplicity, and the time series was calcu- 
lated using Equation [3] with tDM('^) = 0. An intensity 
threshold was applied to the resulting time series with 
SNRmin = 3 (solid horizontal line), and the modulation 
index was calculated for samples above threshold. The 
vertical dashed line and dashed curve are explained be- 
low. 

A broadband signal {fi, — 1) with no inherent struc- 
ture does not increase a spectrum's variance, so the mod- 
ulation index depends only on the number of frequency 
channels and the signal's SNRt 



mihh 



SNRt 



(13) 



Throughout we will see that the number of frequency 
channels scales the modulation index but does not change 
the relative magnitude (i.e., signals with larger have 
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lower mi). As the number of channels from a single in- 
strument generally remains constant, it is unimportant 
to the interpretation of a single data set but is impor- 
tant when comparing data from different instruments 
and choosing an appropriate mj max- 

Equation[T3]reveals a direct relationship between SNRt 
and mi. The modulation index of a broadband pulse 
is not arbitrary; rather, it must fall along a curve pro- 
portional to 1/SNRt. Furthermore for a signal with a 
given SNRt , the broadband modulation index is the low- 
est mi the signal may have on average, as any < 1 
increases the frequency modulation of a signal and corre- 
spondingly its modulation index. In Figure [T] the broad- 
band pulses (closed circles) cluster in a stripe centered 
at SNRt = 10 and mi^bb = 1-6. While each pulse has 
an inherent SNRt = 10, the underlying noise in the data 
spreads the SNRt and mi of individual realizations about 
the ensemble average value (mi^bb) and along the curve 
given by Equation [T3] (dotted curve). 

In practice there is a special SNRt: SNRmin, the min- 
imum SNR constraint applied to the time series in the 
"thresholding" operation. The corresponding modula- 
tion index is given by Equation [T3l 
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This modulation index represents the average maximum 
™i,bb that may exist in a set of thresholded candidates. 
In real data there will be spectra with SNR above the 
SNRmin due to noise alone. The modulation index of 
thresholded noise follows the same relation as broadband 
pulses but with SNRt ^ SNRmin. Most of the events due 
to noise will cluster near mi^x and SNRmin, but the rare 
stronger noise events will pepper the broadband curve 
toward larger SNRt and lower m/. In Figure [T] thresh- 
olded noise is shown as open circles, and only 100 points 
are plotted to reduce clutter. The weakest noise clusters 
near SNRmin — 3 and mi^x = 5.3, and the stronger noise 
climbs the mi oc 1/SNRt curve (dotted curve). 

Because toi,t is on average the largest value of mod- 
ulation index for a broadband signal in a candidate list, 
it is an upper limit to the choice of mi^max- Choosing 
a wi^max > 'm'i,T would only return events from thresh- 
olded noise or RFI. The dashed vertical line in Figure [1] 
shows mi^T for SNRmin = 3. If a modulation index upper 
limit is applied at mi^max — ™i,t, events to the left of 
the line are kept (all of the broadband pulses and about 
85% of the thresholded noise), while all events to the 
right are dropped (all the ultra-narrowband spikes and 
about 15% of noise events). A larger SNRmin would raise 
the solid line and move the dotted line to the left, which 
reduces the false alarm rate due to noise but limits one 
to detecting stronger pulses that are presumably rarer. 

For the ultra-narrowband case where f^, = 1/Ny and 
SNRt — cx), the dependence on SNRt drops out, and the 
average modulation reduces to 



mi., 



= .JN, ~ 1. 
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Note that the modulation index for spiky signals depends 
only on the number of channels. Equation 1151 also gives 
the upper limit on to/ as any /,y > 1 /iV^ reduces the mod- 
ulation and decreases the modulation index. Spiky RFI 
is illustrated in Figure [T] as two stripes of open squares 




Figure 2. Modulation index vs. filling factor calculated using 
Equation 1121 with Nv = 256. From top to bottom (thin to thick), 
curves are shown for four values of SNRt: 3, 5, 10, 100. 

centered at SNRt — 10 and 5 respectively and mi g — 16. 
Like the broadband signals, the noise spreads the points 
about the ensemble average value, but clearly mi^g does 
not depend on SNRt for narrowband signals. 

The modulation index for intermediate filling factors at 
constant SNRt must transition smoothly from mi ^b to 
mi s as fu goes from 1 to 1/iV^. The exact manner of the 
transition is given by Equation [T^l Figure [5] illustrates 
the analytical variation of modulation index with filling 
factor for four values of the time series SNR ranging from 
SNRt = 3 (top) to SNRt = 100 (bottom) and = 256. 
At low filling factors all curves tend toward \/Ny^ and at 
high filling factors the stronger the signal, the lower the 
modulation index. Most of the drop in modulation index 
happens at < 0.1, suggesting that this technique can 
easily classify signals with filling factors less than about 
10% but is less sensitive for signals with moderate to high 
filling factors. 

In the above analysis we have assumed that a signal is 
either narrowband or broadband. Reality is messier, and 
we might have overlapping signals that are both narrow- 
band and broadband. For example a pulse might occur 
at the same time as persistent narrowband RFI. To ex- 
plore this case we adopt a simple model looking at the 
modulation index of a broadband pulse with no intrinsic 
frequency structure contaminated by an RFI spike that 
is one channel wide (i.e. the "ultra-narroband" case de- 
scribed above). Equation [T51 estimates the modulation 
index for this two-component case: 
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where SNRt_bb and SNRt_s are the time series SNR of 
the broadband pulse and RFI spike respectively. Note 
when SNRt^s — > 0, the above equation reduces to the ex- 
pression for broadband pulses (Equation I13|) , and when 
SNRt^bb — > 0, the above equation reduces to the expres- 
sion for ultra- narrowband spikes fEquation ll5[ up to the 
"—1" , which is negligible for large N^). When SNRt^bb ~ 
SNRt^s, we see that mi^bb+s ~ 0.5mi^s- If we choose 
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the modulation index cutoff to be toi,t, this imphes a 
SNRmiii < 2 to allow mi_bb+s to be above threshold, 
which is an unreasonably low threshold for most surveys. 
Furthermore using Equation [161 can estimate that in 
order for mj^bb+s > 'rni,T, SNRt^bb/SNRt^s > SNRmm-l- 
For example if SNRmin = 5, the SNRt of the pulse must 
be four times larger than the SNRt of the narrowband 
RFI. Using only the SNRt and modulation index, we 
could easily miss a pulse if it occurs concurrent with 
strong, narrowband RFI. This finding that the modula- 
tion index is more sensitive to narrowband signals than 
broadband signals is consistent with Figure [5] and sug- 
gests our method is not a substitute for RFI excision 
techniques that identify persistent, strong, narrowband 
RFI from raw data. 

2.4. Interstellar scintillations 

Small scale density irregularities in the ionized inter- 
stellar medium (ISM) scatter and refract radio waves. 
Diffractive interstellar scintillations (DISS) and refrac- 
tive interstellar scintillations (RISS) are observational 
phenomena seen in compact radio sources due to these 
irregularities. Scintillations are characterized by inten- 
sity variations with typical time (A^diss) smd frequency 
scales (Ai^Diss): a-nd for DISS in the strong scatter- 
ing regime, a time series is modulated as a random 
variable guiss{t) with an exponential amplitude distri- 
bution and fractional intensity variations on the order 
of unity. More generally, DISS varies with both time 
and frequency {gBissit, i^)) with the diffraction timescale 
scaling as A<diss ~ '^^'^ and bandwidth scaling as 



Az^Diss 



Also note that Ai/diss and the pulse 



broadening time (rd) are Fo urier transform pairs and 
given by 27rAi'DissTd = Ci (jCordes and RicketH Il998| ) 
where Ci « 1. For fast transients we can safely as- 
sume that the duration of the pulse is much less than the 
diffraction time scale (at least for DM < a few hundreds 
at = 1 GHz), whereas the scintillation bandwidth may 
be of the same order as the channel resolution or band- 
width of a spectrometer. Frequency structure caused by 
DISS will increase the variance and thereby increase the 
modulation index of a signal. 

Understanding the frequency structure of a spectrum 
due to strong scattering is clarified by defining three lim- 
its based on the relative sizes of Av, B, and Ai^diss 
where Av is the wid th of a single frequency channel 
in a spectrum (Cordes "et al.l l2004[ ). When the scin- 
tillation bandwidth is smaller than the channel band- 
width (Az^Diss Av), the instrument effectively aver- 
ages over several "scintles" and the intensity modulations 
are quenched. In this case m\ <^ 1, and the modula- 
tion index follows the expressions in Section 12.31 Simi- 
larly in the other extreme where there is only one scintle 
across the band (0.2B < Ai^diss): the intensity variation 
is nearly flat over the entire bandwidth and niA ^ 1. 
For the intermediate case where there are several dis- 
tinct scintles across the band {Av < Ar^Diss ^ 0.2_B), 
the amplitude of each scintle is exponentially distributed 
and mA ~ 1. As instruments become increasingly wide 
band and B/v ^ 1, this situation will become ever more 
relevant. Ex a mples of these three cases can be seen in 
iCordes et al.l (|2004[ ) for giant pulses from the Crab pul- 
sar. 

Connecting this to the discussion in Section [221 the re- 



sulting average modulation index of a spectrum contain- 
ing Gaussian noise and a broadband signal with expo- 
nentially distributed amplitudes is given by Equation [12] 
with = 1 and toa = 1- For weak signals the leading 
term of Equation 1 1 2 1 dominates and the spectral modula- 
tion index increases only slightly over that for a perfectly 
flat signal (i.e. Equation [T5)) . For example, the modula- 
tion index of a spectrum with N^, — 256 and SNRmin = 5 
increases from mi = 3.2 to mi — 3.35 . For strong signals 
(i.e. SNRt » -s/NV) the first term in EquationfT^becomes 
negligible and mi mA ~ 1 for all SNRt ^ -y/NT". 

2.5. Self Noise in the Pulsar Signal 

Broadband pulsar si gnals have been mod eled as ampli- 
tude modulated noise (IRickett et al. 1975 ) and as mod 



ulate d, polarized shot noise (jCordesI [19761 : ICordes et al.l 
I200I . The noise in these models corresponds to the 
emission over a broad range of radio frequencies while 
the modulation accounts for pulse structure. In this con- 
text, the modulation index of the signal is nonzero but is 
still smaller than the modulation indices expected from 
RFI. In the limit where the noise has Gaussian statistics, 
the intensity modulation index of polarized noise is 
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where miss is the modulation index of frequency struc- 
ture from DISS and dp is the degree of polarization. The 
largest modulation is for miss = dp = \ when m/ = \/3- 
The observed modulation will be reduced if frequency 
structure from DISS is much broader than the total band- 
width B or if multiple pulse structures are averaged over 
in a single frequency channel of the spectrometer. 

2.6. Signature of Incorrect Dedispersion 

A bright pulse in a survey dedispersed with a large 
number of trial dispersion measures will yield events at 
many incorrect DMs in addition to the correct one. The 
true DM will return the largest SNRt and narrowest 
pulse width on average. The residual pulse smearing 
from neighboring, incorrect DMs yields a lower SNRt 
and wider pulse. The larger the DM error, the smaller 
the SNR is on average until the SNR drops below the 
threshold. This SNR-DM signature is one of the tests 
that a signal is a true astrophysical pulse and not RFI. 
The modulation index of a pulse also increases as the 
DM error increases. 

The two-dimensional filling factor for a dispersed pulse 
dedispersed with a DM error of SDM is 
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where jAt^Divil is the absolute value of the residual dis- 
persion smearing and Atp is the intrinsic width of the 
pulse in seconds. For a pulse that is perfectly dedis- 
persed, AtsBM = and f,y^t = 1, while for a large DM 
error, Atg^M — ^ 00 and f^^t — )■ 0. 

Ideal matched filtering of a dispersed pulse, either in 
the absence of dedispersion or from residual smearing 
from incorrect dedispersion, smooths over the duration 
of the pulse's dispersion sweep, consolidating the signal 
into a single time bin. The signal has a lower SNRt 
than the pulse's intrinsic SNRt because f^^t < 1, but the 
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one-dimensional filling factor is still = I. The spec- 
tral modulation index increases slightly due to the drop 
in SNRt (Equation [T3)) . For example, a hypothetical, 
unresolved Crab giant pulse detected in the data set de- 
scribed in Section r3.4.2l with an intrinsic SNRt = 30 has 
mi = 0.75 when dedispersed using the true DM of the 
Crab pulsar (DM = 56.71 pc cm~^) and mi ~ 4.3 when 
dedispersed with a DM error of 5DM = 2 pc cm~^. 

The spectral modulation index does reflect the sig- 
nature of incorrect dedispersion but only indirectly. A 
more powerful statistic would depend on f^^t rather than 
This is accomplished by calculating an alternative 
modulation index {rrii^r) using the time resolved inten- 
sity moments given by Equation [9l For the same hypo- 
thetical Crab giant pulse described above, the resolved 
modulation index is the same as the spectral modula- 
tion index (mi = 0.75) when the pulse is correctly dedis- 
persed, but it increases to mj^r = 24.7 for a DM error of 
SBM = 2 pc cm~3. 

Combining the information provided by the spectral 
modulation index and resolved modulation index helps 
to identify spurious events from incorrect dedispersion. 
By first applying a cutoff in spectral modulation in- 
dex, events are classified as either broadband or narrow- 
band in frequency. The resolved modulation index of the 
broadband signals distinguish those that are also broad 
in time from those that are narrow in time. 

2.7. Modulation Index Cutoff 

The role of the spectral modulation index in a source 
detection pipeline is analogous to the SNR threshold 
(SNRmin)- A signal is first classified as interesting or not 
based on its SNR. For candidate signals that are strong 
enough, the application of a spectral modulation index 
cutoff (mi^max) classifics the candidate as interesting or 
not based on how broadband it is. Just as SNRmin is ap- 
plied to data automatically, the modulation index cutoff 
can be applied without human supervision. 

The choice of mi.,„ax is influenced by the characteristics 
of the instrumentation and pipeline, as well as the im- 
portance of astrophysical effects such as diffractive inter- 
stellar scintillations. The upper limit to mi^max is given 
by modulation index at the SNR threshold (mi^x)- As 
explained in Section 12. 3[ mi^x is the largest modulation 
index on average for a broadband pulse with a perfectly 
flat spectrum in a survey with an applied SNR mini- 
mum SNRiniii fEguation [Til) . Choosing mi^max > mi.T 
only yields candidates from thresholded noise or narrow- 
band RFI. One exception is astrophysical signals with 
frequency structure, such as that caused by DISS. But 
as we showed in Section [2.41 the increase in the modula- 
tion index for weak signals is small (^ a few percent), and 
one could increase mi^max to allow for weak, scintillating 
signals with minimal increase in false positives. 

Choosing mi^max < "m-i.T will reduce the number of 
false positives caused by thresholded noise and incor- 
rectly dedispersed pulses, because it effectively applies 
a larger SNRmin fEguation [T^. The cost is reduced sen- 
sitivity, as on average only pulses with SNRt exceeding 
than the larger effective SNRmin will be below mi^max- 
But for surveys where interstellar scintillation may be im- 
portant, a hard lower limit to the modulation index cut- 
off is set by the nature of exponential statistics; namely. 



?7ii,max = 1, as described in Section [IM 

Real signals will not always cleanly divide into broad- 
band or narrowband. For example RFI could be 
marginally broadband, a real pulse could have strong, 
narrowband structure, or both narrowband and broad- 
band signals could occur simultaneously. But for fast 
transients we also have an additional parameter: disper- 
sion. Looking at the peak dispersion measure of a candi- 
date signal can break the degeneracy between RFI and 
an astrophysical signal with the same modulation index. 

2.8. Correlation Bandwidth 

While the modulation index provides information 
about the degree of modulation in a spectrum, it does not 
provide any information about the shape of the modula- 
tion. A signal with amplitude A localized in N adjacent 
bins has the same variance as a signal with the same 
amplitude whose power is distributed in N isolated bins 
across the band. We therefore need a new measure that 
reflects the distribution of a signal in frequency. 

We define the characteristic bandwidth (Be) to be the 
typical width of the frequency structure of signal. A 
broadband signal has B^ = B, and a narrowband sig- 
nal has Be ~ Ai/. We define the fractional correlation 
bandwidth as a measure of the typical correlation scale 
compared to the total bandwidth 

FCB = ^. (19) 

The Be can be estimated by calculating the correlation 
length from the autocorrelation (ACF) of a spectrum, 
which we define to be the half width half maximum 
(HWHM) of the first lobe of the ACF. 

Figure [3] illustrates the technique for two simulated 
spectra with the same total intensity and variance. The 
left spectrum contains a Gaussian-shaped signal with a 
FWHM of 32 frequency channels. The right spectrum 
was generated by randomly swapping the channels in the 
left spectrum in groups of four, so that both spectra have 
the same total intensity, variance, and modulation index 
(mi ss 1.5). The bottom panels show the ACFs for the 
two spectra. To have the zero lag of the ACF equal to 

mj, the ACF was scaled and offset by ACF/(iV^.I^) - 1, 
where / is the mean of the spectrum. For the Gaus- 
sian spectrum, FCB ~ 0.18 and is consistent with the 
autocorrelation of a Gaussian with a FWIIM=32 chan- 
nels. For the spiky spectrum, FCB — 0.03 is consistent 
with a signal with spikes approximately four channels 
wide. Both spectra could be interesting astrophysical 
signals. The left signal could clearly be caused by an as- 
trophysical process. The right signal could be indicative 
of strong interstellar scintillations. In any case the frac- 
tional correlation bandwidth provides another parameter 
that can be used to automatically classify a signal. Note 
the values for the fractional correlation bandwidth were 
calculated automatically along with the ACF and other 
statistics, suggesting the FCB could be implemented in 
an unsupervised pipeline. 

3. APPLICATION 

This section applies the techniques described above to 
the detection of fast radio transients. First we discuss 



8 




32 64 96 128 32 64 96 128 

Frequency Channel Frequency Channel 




32 64 96 128 32 64 96 128 



Frequency Lag Frequency Lag 

Figure 3. Fractional correlation bandwidth (FCB) for two exam- 
ple spectra. The top panels show two simulated spectra with the 
same total intensity and variance and Ny = 128. The left spectrum 
contains a Gaussian with a FWHM frequency width of 32 channels, 
and the right spectrum is a spiky spectrum generated by randomly 
swapping groups of four channels in the left spectrum. The au- 
tocorrelation functions (ACF) of the sample spectra are shown in 
the bottom panels. The ACF has been normalized by the square 
of the mean of the spectrum and the number of channels all minus 
1. The characteristic bandwidth is calculated as the half width at 
half maximum of the first lobe of the ACF, and the corresponding 
FCB is given in the upper right corner of the lower panels. 

one approach of incorporating the calculation of the spec- 
tral modulation index into a transient detection pipeline. 
To illustrate the technique we simulate a transient de- 
tection pipeline and apply the detection scheme to real 
data containing known transients. Our simulations and 
applications to real data show that a modulation index 
cutoff efficiently flags RFI and significantly reduces the 
number of candidates. Note our implementation starts 
with a list of pulse candidates generated in the standard 
manner and not on the raw data directly. A discussion 
of how the modulation index could be used to flag raw 
data in real time is discussed in Section 14.41 Also note 
our technique is independent of the number of beams 
(e.g. Arecibo L-band Feed Arra:^0 ) or stations (e.g. Very 
Long Baseline Array, [Thompson et al.ll20Tll ) used to col- 
lect the data. 

3.1. Implementation in a Detection Pipeline 

The pipeline makes two passes over the data. The first 
pass defines a list of candidate events, and the second 
pass calculates the second moment and modulation index 
of these events. This two-pass approach is adopted for 
practicality and flexibility and will be explained in detail 
below. Also recall that the modulation index requires 
the first and second central moments, so the data must 
be bandpass subtracted before / and P are calculated. 

The data are first dedispersed as described by Equa- 
tion [3] to generate a first moment time series. Surveys 
use a range of trial dispersion measures, each generat- 
ing its own time series. These dedispersed time series 
are smoothed in time by applying a template bank of 
matched filters with different properties to account for 

^ |http: //www.nalc . edu/alf a| 



pulses of different widths and profiles. A list of can- 
didates is defined by applying a minimum SNR thresh- 
old (SNRmin) to these smoothed, dedispersed time series. 
In this paper we focus on two specific implementations 
of matched filtering: boxcar smoothing and clustering 
(friends-of- friends) . 

Boxcar smoothing convolves the data with a box- 
car function of length Wt samples. The correspond- 
ing smoothing weights for Equation [5] are Wtf = 1 for 
i' = 0, Wt- In practice boxcar matched filtering is im- 
plemented by iteratively summing adjacent time samples 
so Wt = 2"="" where is the number of smoothing 
ite rations. The details of t his te chnique are described 
in iCordes and McLaughlinI ()2003l ) . A single signal will 
likely be detected at several values of n^m , so the event 
list should be sifted for the boxcar width that yields the 
maximum SNR. 

The cluster, or friends-of-friends, algorithm looks for 
groupings of events in a time series. A cluster is defined 
as a set of events for which there is no gap in samples 
larger than Ngap- For a cluster with N cluster samples, 
the smoothing weights are wtv = 1 for t' = ti,...,t]\[ 
where ti is the first sample in a cluster, is the last 
sample, and t'^^i ~ t'^ < Ngap + 1. Importantly, the clus- 
ter algorithm is agnostic to the symmetry of the pulse, 
unlike the boxcar smoothing filter which is symmetric. 
As pulses from highly scattered sources have exponential 
tails, this algorithm may be better suited to detecting 
such astrophysical objects. 

After a list of possible candidates is defined, one goes 
back to the raw time-frequency data to calculate the sec- 
ond moments and modulation indices. We grab a narrow 
range of raw data centered at the location of the can- 
didate and reprocess it. The time- frequency snapshot 
is dedispersed and smoothed at the DM and smoothing 
parameters determined from the first pass. A time se- 
ries is calculated for both the first and second moments. 
The first moment time series is thresholded in intensity, 
and the modulation index is calculated for the samples 
above threshold. The modulation indices of the events 
are compared to the modulation index cutoff, mi^max, 
and fiagged as either a signal of interest if m/ < mi^max 
or RFI if mi > mi^max- 

The two-pass approach is adopted out of practical con- 
siderations. While the first moment of the dedispersed 
time series can be smoothed directly, the time-frequency 
data must be smoothed before being squared and av- 
eraged in frequency (Equation [7|) . This would require a 
different dedispersed, smoothed, second moment time se- 
ries for each matched filter type and parameter. Further- 
more some smoothing approaches, such as the cluster al- 
gorithm, determine the smoothing weights Wtt' from the 
thresholded first moment time series, making a parallel 
calculation of the second moment time series impractical. 

3.2. Processing Requirements 

For most cases the two pass approach is more com- 
putationally efficient than the obvious alternative of cal- 
culating the second moment in parallel with the first. 
We parameterize the processing required by the second 
pass in terms of the processing required to do the dedis- 
persion in the first pass. Generally dedispersion domi- 
nates the processing time in a transient survey, so this 
is a useful metric. The number of operations required to 
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dedisperse a block of time-frequency data with A't time 
samples and N^, frequency channels with -/Vdm trial dis- 
persion measures is A'^ops,! = Nt x N,^ x Num- If our 
first pass generates A^events candidate events, the number 
of operations required to dedisperse a narrow time range 
with Ns samples around each event at a single DM is 
A'^ops,2 — 4:Ncvcnts xNgXN^. The factor of 4 was included 
to consider the squaring of the data, summing of both 
the original and squared data and bandpass subtraction. 
The processing required by the second pass normalized 
by the dedispersion processing of the first pass is 

If Nt = 10^, iVoM = 10^, Ns = 103, TVevents = 10^ P2 

is a few percent of the original dedispersion processing. 
This analysis ignores the additional overhead incurred by 
returning to the raw data, and in particular one must be 
wary of excess file I/O. 

Most RFI excision techniques operate on raw data, not 
on the list of candidate pulses. Our technique is more 
general, because it can be used at both the beginning 
(see Section |44)) and end of a source detection pipeline. 
The most common RFI excision approach applied to a 
list of candidates is to remove all events at low dispersion 
measure under the assumption that terrestrial signals are 
not dispersed. This is a blunt instrument and does not 
remove events from RFI at higher DMs. Calculating the 
modulation index of the candidate events allows for more 
sophisticated RFI excision in a list of candidates. 

Classifying signals with the modulation index should 
be used in conjunction with other RFI excision algo- 
rithms. As described in Section 12.31 the modulation in- 
dex is more sensitive to narrowband signals, and weak 
pulses may be missed if they occur simultaneously with 
strong, narrowband RFI. This suggests the modulation 
index algorithm works best together with algorithms that 
remove channels that contain persistent RFI. Similarly 
broadband RFI has a low modulation index, so remov- 
ing impulsive RFI through other means will reduce the 
number of events from RFI that fall below the modula- 
tion index cutoff. 

3.3. Simulations 

To assess the usefulness of the modulation index as a 
signal diagnostic, we simulated a single-pulse event de- 
tection pipeline using Python-based software. In our 
simulations fake time-frequency data can be generated 
with Gaussian-distributed noise, dispersed pulses, and 
a variety of RFI. A dispersed pulse is added with a 
specified dispersion measure and Gaussian pulse profile 
with a FWHM of Wt- Spiky RFI is modeled as a two- 
dimensional Gaussian with a FWHM width in time (Wt) 
and frequency (Wi,). Broadband RFI is modeled as an 
undispersed pulse (i.e., DM=0 pc cm^'^). The simulated 
data then undergo single pulse search processing. First 
the time-frequency data are dedispersed over a range of 
trial dispersion measures and both / and is calculated 
from the dedispersed time series. / is thresholded, and 
for samples that are above threshold, the modulation in- 
dex is calculated. Note that because these simulations 
involve a small amount of data, the two-pass analysis as 
described in Section IXTl is not necessary, because all in- 



termediate data products (i.e., dedispersed time series) 
can be kept in memory. 

The fake data shown in panel (a) of Figure |4] set con- 
tains a single dispersed pulse, a Gaussian RFI spike, 
and a broadband RFI spike. The data properties are 
= 256, Ats = 1 ms, Nt = 2000 time samples, 
i^o = 1400 MHz, and B = 100 MHz. A single dispersed 
pulse was added at t — 0.25 s with DM = 500 pc cm~'^, 
SNR^t = Ij and Wt — I. A narrowband Gaussian 
spike was added at t = 1.0 s and Vo ~ 1428 MHz with 
SNR^t = 40, Wt = 2, and VFj. = 2. Finally a broadband 
RFI spike is represented by an undispersed pulse (i.e., 
DM = pc cm-3) at t = 1.5 s with SNR^t = 5 and 
Wt = 1 . The data were dedispersed over a range of trial 
dispersion measures DM = — 1000 pc cm~'^ and DM in- 
terval ADM = 6 pc cm~'^. A list of candidate event was 
defined by applying an SNR threshold of SNRmin = 3 to 
the resulting time series. 

Figure m shows the results of this simulation. The top 
frame shows (a) the time-frequency data, (b) candidate 
events vs. DM and time, (c) 50/mi for the candidate 
events, and (d) intensity vs. DM and time for candi- 
dates below mi^inax = 3.2. For clarity the SNR and pulse 
widths of the three signals in panel (a) have been exagger- 
ated, and the noise is not shown. Instead of plotting the 
intensity, panel (b) shows a dot for each sample that was 
above the intensity threshold to reduce clutter. Panel (c) 
shows that most of the events have similar modulation 
indices; the one exception is for the broadband pulse, 
which has a modulation index almost an order of mag- 
nitude lower than all other points due to its high SNR. 
Panel (d) plots the SNR of the events with modulation 
indices below mi^max with the area of the circle propor- 
tional to the SNR of the event. 

The triangle-shaped group of events near t ^ 1.5 s 
in panels (b) and (c) are spurious hits caused by the 
dedispersion path crossing the bright, broadband RFI 
samples. Most of these events have low filling factors be- 
cause the RFI samples contribute only a few samples to 
the dedispersed spectrum and are not present in panel 
(d). The exception is low DM and t = 1.5 s where signal 
from the RFI contributes to many frequency channels 
resulting in a low modulation index. This is the incor- 
rect dedispersion effect described in Section 12.61 While 
this example of RFI does pass our modulation index fil- 
ter, the low DM of the event exposes it as RFI. Ap- 
plying yet another filter that removes events at low DM 
will remove such events. The narrowband Gaussian spike 
causes a stripe of spurious events between t « 0.9 — 1.0 s 
as the path of the each trial dispersion measure crosses 
the spike. But for reasons just described above, these 
spurious candidates have modulation indices above our 
threshold. Finally the true astrophysical dispersed pulse, 
which is buried in the middle panels of Figure IH stands 
out in panel (d) at t = 0.25 s after applying the modula- 
tion index cutoff. 

The bottom frame in Figure 0] plots SNRt vs. mj and a 
histogram of mi to illustrate how the modulation index 
groups signal types. The spurious events caused by the 
Gaussian spike (medium gray) are clumped together on 
the far right of the plot between 8 < toi < 20, consis- 
tent with the extreme narrowband case given by Equa- 
tion [15] (mi^s = 16). The events due to the dispersed 
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pulse and broadband RFI follow the light gray and black 
tracks respectively. The tracks are caused by incorrectly 
dedispersing the signal, and in both cases the lowest 
modulation index corresponds to the correct dispersion 
measure. The events from thresholded noise (black) are 
clumped near toi^t 5 and SNRt ^ SNRmin as predicted 
by Equation [T31 Because the weaker events associated 
with the broadband RFI overlap with the area contain- 
ing thresholded noise, we've plotted them with the same 
color. The histogram of mi in the lower panel shows that 
by applying a mi^max = 3.2, we have eliminated most of 
the events. 

3.4. Application to Data 

In the next two subsections we apply our modulation 
index classification technique to two known sources of 
single pulses detectable by single-pulse search pipelines. 
In Section lHXTIwe a pply the method to RRAT J1928-I-15 
and in Section l3.4.2l to giant pulses from the Crab pulsar. 
Although in both cases we know the correct dispersion 
measure of the source, we re-analyze the data at a range 
of trial dispersion measures to recreate typical survey 
results. 

The data were bandpass corrected by dividing by the 
median bandpass and subtracting off the mean. Dividing 
by the median spectrum flattens the spectra, assuring the 
bandpass shape is not contributing to the variance calcu- 
lation. We choose the median because it is less sensitive 
to extreme values caused by pulses or RFI. Subtracting 
the mean assures us that our spectra have zero mean. 

3.4.1. RRATs 

RRA T J1928-fl5 was discovered by iDeneva et~all 
(|2009f) in the PAL FA survey running at the Arecibo 
Observatory. (See IDeneva et al.l 120091 for the details 
of the observation and PALFA parameters.) In brief, 
J1928-I-15 was discovered using the single-pulse search 
algorithms implemented by the Cornell pulsar search 
pipeline. Three pulses were detected with an interval 
of 0.403 s at DM = 242 pc cm"^. We reprocessed the 
bandpass corrected data with 642 trial dispersion mea- 
sures roughly equally spaced over a DM range DM — 
— 500 pc cm^'^. The dedispersed time series had an 
intensity threshold applied at SNR^in = 4. The samples 
above the SNR threshold were run through the cluster 
algorithm, and the sample in each cluster with the max- 
imum intensity was used to calculate the modulation in- 
dex. 

The results are shown in Figure [S] In all panels only 
events with SNR > 5 are shown. The top frame plots in- 
tensity versus DM and time for all events above threshold 
(top) and the e vents that also satis fy mi < 1. (Compare 
to Figure 3 in IDeneva etall 120091) . The strong RRAT 
pulse is clearly visible around t = 100 s, and a weak 
broadband RFI spike occurs near t = 89 s. For the strong 
RRAT pulse we measure SNRt « 20, and for the preced- 
ing weaker pulse, S NRt ~ 6. The differe nce between our 
values and those in IDeneva et al.l (|2009[ ) is likely due to 
the fact that we used the cluster algorithm and they used 
the boxcar smoothing matched filter algorithm. We also 
do not detect the weaker, tailing pulse, likely for the same 
reason. Applying a mi^max = 1 eliminates ~ 99% of the 
events, leaving only the strongest events associated with 
the bright RRAT pulse. 
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Figure 4. Diagnostic plots for tlie simulated data described in 
Section 13.31 Top Frame: Panel (a) shows a grey scale of the 
simulated time-frequency data where the SNRs and widths of 
the signals have been exaggerated for clarity. Panel (b) plots a 
point in the DM-time plane for each event above the intensity 
threshold (SNRnjin = 3). Panel (c) plots 50/m/ for the candi- 
date events in panel (b). Panel (d) plots the SNR of candidates 
over the SNR threshold and below the modulation index threshold 
(™l,max = 3.2). The data were dedispersed over a DM range of 
to 1000 pc cm~^ with ADM = 6 pc cm"'^. The dispersed pulse 
is located at DM = 500 pc cm"'' and at t = 0.5 s. A Gaussian 
RFI spike is located at i = 1.0 s and a frequency of ~ 1428 MHz 
and causes the stripe of events between t ^ 0.9 — 1.0 s. Broadband 
RFI is located at i = 1.5 s and causes the triangle-shaped patch 
of candidate events. Bottom Frame: The top panel plots SNRt 
vs. mi, and the bottom panel is a histogram of modulation in- 
dex. Events associated with the broadband RFI (and thresholded 
noise) are plotted in black, events associated with the Gaussian 
RFI spike are shown in medium gray, and the events associated 
with the dispersed pulse are plotted in light gray. 

The bottom frame shows SNRt vs. mi (top) and a his- 
togram of mi (bottom). In the top panel the events 
associated with the strong RRAT pulse are localized 
along the upper curve extending from SNRt ~ 20 and 
mi w 0.8 to SNRt ~ 5 and mi ~ 3. The lower limit 
of mi w 0.8 is consistent with the value predicted by 
Equation [T51 for SNRt = 20. The curve itself comprises 
spurious events caused by the strong pulse being dedis- 
persed at incorrect dispersion measures and account for 
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Figure 5. Modulation index calculation for the RRAT J1928+15. 
In all frames events with SNR > 5 are shown. Top Frame: Intensity 
vs. DM and time (top) for the events above the SNR threshold and 
the same events (bottom) with an additional modulation index 
threshold (mi max < 1) applied (bottom). The area of each circle 
is proportional to the SNR of the event. (See IDeneva et ani2009l . 
Figure 3.) The RRAT pulse is clearly visible around t = 100 s 
and DM = 242 pc cm~^ and a weak, broadband RFI spike around 
t = 89 s. Bottom Frame: The SNRt vs. mj (top) and a histogram 
of mi (bottom) . The data were dedispersed using ~ 640 trial 
dispersion measures ranging from to 500 pc cm~^. 

about 90% of all events in this figure. As described 
in Section 12. 6[ applying a modulation index cutoff can 
reduce the number of spurious events from real signals 
at incorrect DMs. The broadband RFI spike produced 
the events along the vertical line along SNRt = 5 and 
1.25 < mi < 2.5. The events associated with thresholded 
noise are grouped between 2.5 < m/ < 3.5 and are con- 
sistent with TOi^T = 3.2 for N^, = 256 and SNRmin = 5. 
While applying mj^max — 1 cleanly isolated the bright 
pulse from the RRAT, we did so at the cost of apply- 
ing an effective SNRmin = 10 and thereby removed the 
weaker RRAT pulse. 

3.4.2. Crab giant pulses 

The most well-known source of single, dispersed pulses 
is the Crab pulsar. We reprocessed a 430 MHz data set 



containing t he "supergiant" pul se with a SNRt ~ 1500 
reported in iCordes et al.l ()2004l ). The 140-second long 
dataset was dedispersed using 616 trial dispersion mea- 
sures ranging from DM = — 250 pc cm~^. An intensity 
threshold of SNRmin — 4 was applied to the dedispersed 
time series, and the samples above threshold were run 
through the cluster algorithm. The modulation index 
was calculated for the sample from each cluster that had 
the maximum intensity. This data set proved to be a par- 
ticularly interesting case study due to the large number 
of pulses with a range of SNRs. 

The results of the reprocessing are shown in Figure |6l 
and in all panels only events with SNR > 5 are shown. 
The top frame shows the intensity for the events versus 
DM and time with a circle whose area is proportional to 
the SNR of each event. The top panel shows all events, 
and the bottom panel shows those events that also fall 
below our modulation index cutoff (TOi,max < 2). A train 
of normal giant pulses is clearly visible along the disper- 
sion measure of the Crab, DM = 56.71 pc cm~^, and the 
supergiant pulse is located near t = 50 s. This pulse is 
so bright that it has some of the same characteristics as 
broadband RFI; namely, events occurring at high disper- 
sion measures. But as we saw previously for J1928-I-15, 
applying a mi^max eliminates many of the spurious events 
at incorrect DMs because they have low filling factors. 

The bottom frame plots SNRt vs. nii (top) and a 
histogram of mi (bottom). The events associated with 
the supergiant pulse are shown in medium gray and fall 
along the curve extending from lO"' > SNRt > 100 and 
0.7 < mi < 11 as well as the clump at SNRt ^ 10 and 
mi ~ 15. Note that the largest modulation indices are 
consistent with the value for spiky RFI (mi_s « 21). The 
correctly dedispersed normal giant pulses are shown in 
black and lie along the left edge of the main cluster rang- 
ing from 100 > SNRt > 5 and 0.5 < mi < 3. One excep- 
tion is the single, larger black point at the low-mj end of 
the supergiant pulse curve (medium gray) showing the 
point associated with correct dedispersion value. Not 
surprisingly this event has the largest SNRt and lowest 
mi. The events from incorrectly dedispersing the regular 
giant pulses (light gray) spread to lower SNRt and higher 
mi than their correctly dedispersed counterparts (black). 

< 7 



m/ 



Finally the noise (light gray) falls between 2 < 
and along SNRt ^ 5 and is consistent with mj^x ~ 4.2. 

The measured values of mi for the Crab pulses are sys- 
tematically larger than that predicted by Equation [T3] 
for broadband pulses. The brightest regular giant pulses 
have SNRt « 100 and mj ~ 0.5, but the modulation 
index of a perfectly flat pulse with this SNRt is ~ 0.2. 
This suggests that Crab giant pulses have significant in- 
herent frequency strucutre and non-zero rriA- Most strik- 
ing is that the modulation index for the supergiant pulse 
(mi « 0.7) is larger than the modulation index for the 
strongest regular pulses even though its SNR is two or- 
ders of magnitude larger. The spectrum of the supergiant 
pulse shows signifi cant variation acros s the band due to 
DISS, as shown in ICordes et al.l (|2004[ ). 

The histogram of mi shows the total number of events 
at all DMs (thick line) and the number of events at the 
DM of the Crab (thin line) as an proxy for the number 
of giant pulses. The assumption that any event at the 
DM of the Crab is a giant pulse is simplistic and may 
include false positives. Because brighter giant pulses are 
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Figure 6. Modulation index calculation for the Crab pulsar at 
430 MHz. In all frames only events with SNR > 5 are shown. Top 
Frame: Events above the intensity threshold are plotted against 
DM and time (top) and with the additional constraint mi niax < 2 
(bottom). A stream of Crab giant pulses can be seen in the DM- 
ti me plot at DM=56.7 1 pc cm~^. The supergiant pulse discussed 
in ICordes et al.l 120041) occurs around t = 50 s. Bottom Frame: 
The SNRt vs. m/ is plotted in the top panel. The points associ- 
ated with the supergiant pulse are plotted in medium gray, points 
associated with the dispersion measure of the Crab are shown in 
black, and all other points shown in light gray. A mj histogram 
(bottom) plots the total number of events (thick line) and the num- 
ber of events at DM=56.71 pc cm~'^ (thin line) as a proxy for the 
number of pulses detected. 

rarer than weaker pulses, Equation [13] tells us that the 
number of pulses in a mi bin will decrease as mi — > 0. 
There is therefore a tradeoff between choosing a higher 
'7ii,max to allow for the more common, weaker pulses and 
the false alarm rate. 

Figure [7] further explores the role of interstellar scin- 
tillations on the modulation index. The SNRt, vs. mi 
is plot ted for two higher frequencies from ICordes et al.l 
(|2004D . The data are time- frequency snapshots of pulses 
determined from previous processing at 1475 and 2850 
MHz. The snapshots were only dedispersed at the dis- 
persion measure of the Crab pulsar, the time series were 
thresholded with SNRmin = 3, and the resulting can- 
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Figure 7. SNRt vs m/ o f Crab giant pulses a t 1475 MHz (top) and 
2580 MHz (bottoni) from ICordes et al.l 1)20041 1. The lines represent 
"^l,bb (Equation ll3ll . The modulation index that divides the pulses 
from thresholded noise is mj 2 for 1475 MHz and mi 1.5 for 
2850 MHz. The modulation indices of the thresholded noise are 
consistent with those of broadband pulses. The modulation indices 
of the p ulses are systematically larger than what is predicted by 
Eguation llSI suggesting the spectra have inherent structure caused 
by diffractive interstellar scintillations. 

didates were run through the cluster algorithm to find 
the sample with the maximum SNR. Also plotted is the 
'7ii,bb curve given by Equation 1 131 with = 118 at 1475 
MHz (top) and N„ = 58 at 2850 MHz (bottom). 

In both panels there are two clusters of points: one 
due to thresholded noise and one due to the pulses. The 
thresholded noise lies below mi = 2 and mj = 1.5 at 
1475 and 2850 MHz respectively. The typical values 
for the noise are consistent with mi x and lie along the 
broadband curve as predicted in Section [2?3l The mod- 
ulation indices of the pulses are systematically larger 
than what is predicted, suggesting the pulses have in- 
herent frequency structure and uia 0. The spectra 
for the strongest pulse at each frequency are shown in 
Cordes et al. 2004, Figures 7 and 9, and in both cases 
they exhibit frequency structure caused by DISS. 

4. EXTENSIONS OF THE METHOD 

Our discussion has focused the detection of radio 
bursts using incoherent dedispersion, but the technique 
has more general applicability. It can also be applied 
to other classes of signals (periodic objects like pulsars, 
spectral lines) and other data formats (coherent dedis- 
persion, image cubes from interferometers). 

4.1. Periodic Signals 

Using the modulation index to characterize the fre- 
quency structure of a single pulse can be extended to 
periodic signals. The procedure is best illustrated by 
considering a dedispersed time series I{t, DM) that has 
been folded at some period P. A pulse in this folded time 
series is the sum of iVp pulses 

lpito,DM)^^J2^{t,DM) (21) 
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where tp — to+nP, to is the time sample of the candidate 
pulse in the folded time series, and n = . . . A'^p — 1. The 
modulation index is then calculated for samples where 
Ip{t,DM) is above the SNR threshold. As discussed in 
Section 12.11 for smoothing in time, calculating the sec- 
ond moment requires that spectra are first summed in 
time before averaging over frequency, making it obliga- 
tory that one returns to the time-frequency data 



-Lj2lit,.,DM) 



(22) 



One can also calculate the second moment for the Fourier 
analysis method of finding pulses signals but that is be- 
yond of the scope of this paper. 

The additional computational cost required to calcu- 
late the modulation index can be estimated in the same 
manner as for the single pulse detections described in 
Section 13.21 But as a pulsar is a repetitive source, the 
time span of the data that would need to be reprocessed 
in a second pass would be larger than for a single pulse. 
In the extreme case where Ng — Nt, the processing load 
is comparable when the number of candidates is of the 
order of the number of trial DMs. It is therefore pru- 
dent to reduce the number of candidates by other means 
first before calculating the modulation index of candidate 
pulsar. 

4.2. Coherent Dedispersion 

Coherent dedispersion operates on the basedband volt- 
age directly rather than on intensity-like quantities at the 
output of a spectrometer, which our previous discussion 
has centered on. In a survey, a set of trial values for DM 
would be used, as with post-detection dedispersion de- 
scribed earlier. With the correct value of DM, coherent 
dedispersion restores the phase of the signal to what it 
was prior to alteration by the ISM (so long as multipath 
scattering is not important). The procedure deconvolves 
a phase function from voltage data e(t) by applyin g a 
complex phase filter fe.g.. .Hankin s and Rickettlll975[ ) to 
produce the dedispersed voltage ed{t). The detected sig- 
nal Id{t) = I Ed (OP would then be analyzed with a SNR 
threshold to identify candidate bursts. 

If the true signal is an unresolved pulse with width 
W <C where, as above, B is the total bandwidth, the 
dedispersed output will have a width Wt ~ only if 
the signal is unmodulated across the band. This requires 
that any scintillation modulation have a characteristic 
bandwidth ^ B. In this instance, no new information is 
gained by analyzing the data in the combined frequency- 
time plane. 

For signals that have broader time extent, however, the 
analysis presented in the paper for post-detection dedis- 
persion still applies. The dynamic spectrum would be 
calculated for each burst identified in Id{t) from short 
time blocks of length Atg using the discrete Fourier 
transform (DFT) e{t) of the baseband signal. Us- 
ing an A^i/— point DFT, the uncertainty relation implies 
AtsB/N^ = 1. For burst widths W > At^, the series of 
spectra that span the burst can be used to calculate the 
modulation index across frequency, which is then used to 
classify signals as before. 



4.3. Application to Images Obtained On a 
Frequency- Time Grid 

The modulation index as discussed so far is calculated 
for a set of intensity measurements sampled in time and 
frequency at a particular value of the dispersion mea- 
sure, I{t,i', DM). For signals that are not inherently 
narrow in time (i.e., not pulsed or transient in nature), 
no dedispersion is required. Instead, we simply calculate 
the modulation index mi as the normalized variance of 
the intensity across frequency at a particular time. 



i{tY 



(23) 



The modulation index mj and fractional correlation 
bandwidth FCB calculated in this manner can be used, 
for example, to characterize radio observations of a spec- 
tral line or maser source (which show coherent frequency 
structure) and discriminate them from radio frequency 
interference with a more random frequency structure. 
We discuss the application of a time-domain modulation 
index to these source types in Section [5] 

The method can be further generalized to apply to in- 
terferometric imaging observations of continuum sources 
that have been acquired in multi-channel modes. While 
past imaging observations with the Very Large Array 
(VLA), for example, have only employed a small num- 
ber of channels (e.g., < 32) or even just one channel 
in continuum mode, observations with the Expanded 
Very Large Array (EVLA) or the future SKA pathfind- 
ers such as Australian Square Kilomet re Array (ASKAP ; 
iJohnston et~ani2009| ) and MeerKAT (|Booth et al.ll2009H 
will typically employ many more channels. At the EVLA, 
the wide bandwidth available for continuum imaging 
(e.g., 1-2 GHz or 2-4 GHz) requires the use of many 
spectral channels to avoid chromatic aberration ( "band- 
width smearing"). If we were to require a maximum 
tolerable peak response reduction of 20% at A-array, 1- 
2 GHz, that would necessitate a channel bandwidth Av 
such that 

Av e 

1.5 GHz^ ^ 

where 9/6o is the source offset from the phase tracking 
center in units of the synthesized beamQ To maintain 
sensitivity at that level over just half the primary beam 
field of view, 0/Oq ^ 0.5 x d/D where d and D are the dish 
diameter and array size respectively. For the EVLA in 
A-array where d = 25 m and D « 36 km, this constraint 
requires as many as 720 channels for continuum imaging. 

One of the challenges faced by automated source ex- 
traction pipelines working on these data is to discrim- 
inate between low significance detections of compact 
sources and random intersections of the point spread 
function sidelobes caused by strong sources in the field 
of view. While the exact shape of the sidelobes de- 
pends on the details of the array and the observation, 
they will scale as I{9, v) oc at an angular distance 
from a bright source. Therefore, an intersection of side- 
lobes from different sources will show smooth structure 
in amplitude as a function of channel frequency, while 

^ See, e.g., the EVLA Observational Status Summary, 
|http : //evlaguides . nrao ■ eduT] 
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a true compact source will not. Calculating the modu- 
lation index and FCB will allow us to exploit the extra 
information in the frequency channelization of synthe- 
sis images and add an extra discriminant that is easily 
implementable in automatic pipelines. 

4.4. Realtime RFI excision 

The modulation index could be useful for systems per- 
forming RFI excision on the fly. Interest in realtime inco- 
herent dedispersion transient searches is growing thanks 
to the large speed-up s achieved by gr aphics processing 
units (CPUs) (e.g. Magro et al.ll2011t) . As more of the 
signal processing moves into hardware, realtime RFI re- 
iection is al s o moving into instr uments. For example 
iDelled (|2010( ): lWavth et al.l (|2011| ) have incorporated the 
calculation of kurtosis into the F-stage of the DiFX soft- 
ware correlator, and the modulation index could be cal- 
culated in an identical manner. 

In the time-frequency domain the modulation index 
would be most useful for identifying broadband, impul- 
sive RFI. A sample that had an anonymously low modu- 
lation index could be flagged or blanked by the hardware. 
The modulation index alone should not be used to iden- 
tify narrowband RFI, as a dispersed pulse is narrowband 
before the dispersion is correctly accounted for. 

5. DISCUSSION 

In this paper we have focused our discussion on search- 
ing for radio bursts from or similar to those from pulsars 
and used the modulation index, along with SNR, as a tool 
for RFI excision. More generally the modulation index 
can be one among several statistics used to characterize 
and classify signals rather than to filter candidates. Fur- 
thermore there may be as-yet undiscovered source classes 
having different time-frequency signatures. 

Low mass stars (M-dwarfs and brown dwarfs) are 
frequent eni it ters of radio bursts packson et al.l 119891 : 
lBergeiil200^ . iBastian et all (|1990[ ) have shown that the 
dynamic spectra of dMe flare stars exhibit a rich struc- 
ture in both time and frequency. The spectra of the flares 
they observed had both broadband and narrowband com- 
ponents, as well as large spectral indices (a ~ 10), sug- 
gesting various plasma phenomena at work. A large-scale 
survey of flare stars could use the modulation index as 
one of several parameters used to automatically classify 
flare types. Furthermore, different plasma processes re- 
sult in different levels of polarization, so calculating the 
modulation index of each polarization separately could 
further aid in classifying flares and recognizing RFI. 

The study of radio variability in the solar system is 
a natural extension to flare stars and is another area 
where the modulation index could be a tool for classifying 
different radio bursts. Planetary auroral radio emission 
(ARE) has been det ected for all of the magnetic planets 
in our solar system ()ZarkallT998() . and dynamic spectra 
of ARE show complex substructure in both time and fre- 
quency. A notable class of fast radio pulses is Jupiter's 
"S-bursts" . These bursts have durations of milliseconds 
and are the brightest of Jupiter's ARE emission (Zarka 
Il998| ). Radio discharges from lightning on the gas gi- 
ant planets have durations on the or der of 10 to 100 m s 
and flu xes easily detecta ble at Earth (jZarka et al.ll200^ . 
In fact iRuf et ahl (|2009[ ) have used multi-moment tech- 
niques (total intensity and kurtosis) to look for lightning 



on Mars during a dust storm. By extension it is expected 
that extrasolar planets would have radio emission that 
would be varia ble and in clude bright bursts ()Farrell et al.l 
[1999; Gricfinici er et al.| [2C)07). A study of the modulation 
indices of flares from Jupiter could help identify similar 
flares from exo-Jupiters. 

Extragalactic sources of fast radio transients might in- 
clude merging neutr on star-neutron star or neutr on star- 
black hole binaries (iH ansen and Lvutikovl [200ll ) . Short 
gamma-ray bursts (SGRB) are thought to arise from the 
merge r of two neutron stars, and Pshirkov and Postnovl 
(|2010D suggest that GRB monitors could alert low fre- 
quency radio observatories (e.g., LOFAR) to a possible 
radio transient. 

Our focus has been on signals that are broad in fre- 
quency but narrow in time. The reciprocal problem of 
spectral lines that are steady in time can be handled in 
the same way but with the roles of frequency and time re- 
versed. As an example, surveys for maser lines will most 
likely require a line amplitude that is relatively steady 
over time scales of days or less. Interstellar scintillation 
may induce time variations in some maser sources if they 
are compact enough. SETI (search for extraterrestrial 
intelligence) often postulates narrowband beacon signals 
that are constant in time. Interstellar scintillation will 
certainly induce time variations owin g to the compact 
natur e of any relevant transmitters ( Cordes and Laziol 
119911 ) but with a modulation index ^ 1. The methods 
outlined here apply directly to these problems. 

6. CONCLUSION 

We have discussed how defining detection statistics 
based on higher order moments can improve the success 
of source detection pipelines and focused on the spectral 
modulation index. By calculating the second moment, 
we are able to classify signals based not only on their 
strength but also on their fractional frequency variation. 
By applying prior information about our target sources, 
i.e., that they are broadband, the modulation index eas- 
ily distinguishes between broadband and narrowband sig- 
nals and allows us to filter a large fraction false positives 
due to narrowband RFI. 

These detection statistics (e.g., SNR, mi) are crucial 
to source extraction pipelines because they can be calcu- 
lated automatically. As new observatories generate more 
and more data, it is critical that source extraction occurs 
reliably with minimal human intervention. Although we 
limit ourselves to two statistics in this paper, a pipeline 
could make use of many higher-order statistics (e.g. kur- 
tosis, INita et all [2001 along with a weighted voting 
scheme to classify signals in more sophisticated and nu- 
anced ways. For example the modulation index could 
become one parameter used by detec tion pipeline b ased 
on an artificial neural network (Eatough et al.|[2010t ). 
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